board position
Causal Masking on Spatial Data: An Information-Theoretic Case for Learning Spatial Datasets with Unimodal Language Models
Junkin, Jared, Nathanson, Samuel
Language models are traditionally designed around causal masking. In domains with spatial or relational structure, causal masking is often viewed as inappropriate, and sequential linearizations are instead used. Yet the question of whether it is viable to accept the information loss introduced by causal masking on nonsequential data has received little direct study, in part because few domains offer both spatial and sequential representations of the same dataset. In this work, we investigate this issue in the domain of chess, which naturally supports both representations. We train language models with bidirectional and causal self-attention mechanisms on both spatial (board-based) and sequential (move-based) data. Our results show that models trained on spatial board states - \textit{even with causal masking} - consistently achieve stronger playing strength than models trained on sequential data. While our experiments are conducted on chess, our results are methodological and may have broader implications: applying causal masking to spatial data is a viable procedure for training unimodal LLMs on spatial data, and in some domains is even preferable to sequentialization.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Programmatic Representation Learning with Language Models
Poesia, Gabriel, Sampaio, Georgia Gabriela
Classical models for supervised machine learning, such as decision trees, are efficient and interpretable predictors, but their quality is highly dependent on the particular choice of input features. Although neural networks can learn useful representations directly from raw data (e.g., images or text), this comes at the expense of interpretability and the need for specialized hardware to run them efficiently. In this paper, we explore a hypothesis class we call Learned Programmatic Representations (LeaPR) models, which stack arbitrary features represented as code (functions from data points to scalars) and decision tree predictors. We synthesize feature functions using Large Language Models (LLMs), which have rich prior knowledge in a wide range of domains and a remarkable ability to write code using existing domain-specific libraries. We propose two algorithms to learn LeaPR models from supervised data. First, we design an adaptation of FunSearch to learn features rather than directly generate predictors. Then, we develop a novel variant of the classical ID3 algorithm for decision tree learning, where new features are generated on demand when splitting leaf nodes. In experiments from chess position evaluation to image and text classification, our methods learn high-quality, neural network-free predictors often competitive with neural networks. Our work suggests a flexible paradigm for learning interpretable representations end-to-end where features and predictions can be readily inspected and understood.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Natural Language Reinforcement Learning
Feng, Xidong, Wan, Ziyu, Fu, Haotian, Liu, Bo, Yang, Mengyue, Koushik, Girish A., Hu, Zhiyuan, Wen, Ying, Wang, Jun
Reinforcement Learning (RL) mathematically formulates decision-making with Markov Decision Process (MDP). With MDPs, researchers have achieved remarkable breakthroughs across various domains, including games, robotics, and language models. This paper seeks a new possibility, Natural Language Reinforcement Learning (NLRL), by extending traditional MDP to natural language-based representation space. Specifically, NLRL innovatively redefines RL principles, including task objectives, policy, value function, Bellman equation, and policy iteration, into their language counterparts. With recent advancements in large language models (LLMs), NLRL can be practically implemented to achieve RL-like policy and value improvement by either pure prompting or gradient-based training. Experiments over Maze, Breakthrough, and Tic-Tac-Toe games demonstrate the effectiveness, efficiency, and interpretability of the NLRL framework among diverse use cases. Our code will be released at https://github.com/waterhorse1/Natural-language-RL.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Explore the Reasoning Capability of LLMs in the Chess Testbed
Wang, Shu, Ji, Lei, Wang, Renxi, Zhao, Wenxiao, Liu, Haokun, Hou, Yifan, Wu, Ying Nian
Reasoning is a central capability of human intelligence. In recent years, with the advent of large-scale datasets, pretrained large language models have emerged with new capabilities, including reasoning. However, these models still struggle with long-term, complex reasoning tasks, such as playing chess. Based on the observation that expert chess players employ a dual approach combining long-term strategic play with short-term tactical play along with language explanation, we propose improving the reasoning capability of large language models in chess by integrating annotated strategy and tactic. Specifically, we collect a dataset named MATE, which consists of 1 million chess positions with candidate moves annotated by chess experts for strategy and tactics. We finetune the LLaMA-3-8B model and compare it against state-of-the-art commercial language models in the task of selecting better chess moves. Our experiments show that our models perform better than GPT, Claude, and Gemini models. We find that language explanations can enhance the reasoning capability of large language models.
- North America > Canada > Ontario > Toronto (0.14)
- Asia > Middle East > Israel > Haifa District > Haifa (0.04)
Semi-Strongly solved: a New Definition Leading Computer to Perfect Gameplay
Solving combinatorial games has been a classic research topic in artificial intelligence because solutions can offer essential information to improve gameplay. Several definitions exist for `solving the game,' but they are markedly different regarding computational cost and the detail of insights derived. In this study, we introduce a novel definition called `semi-strongly solved' and propose an algorithm to achieve this type of solution efficiently. This new definition addresses existing gaps because of its intermediate computational cost and the quality of the solution. To demonstrate the potential of our approach, we derive the theoretical computational complexity of our algorithm under a simple condition, and apply it to semi-strongly solve the game of 6x6 Othello. This study raises many new research goals in this research area.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > New York > New York County > New York City (0.04)
Impartial Games: A Challenge for Reinforcement Learning
While AlphaZero-style reinforcement learning (RL) algorithms excel in various board games, in this paper we show that they face challenges on impartial games where players share pieces. We present a concrete example of a game - namely the children's game of Nim - and other impartial games that seem to be a stumbling block for AlphaZero-style and similar self-play reinforcement learning algorithms. Our work is built on the challenges posed by the intricacies of data distribution on the ability of neural networks to learn parity functions, exacerbated by the noisy labels issue. Our findings are consistent with recent studies showing that AlphaZero-style algorithms are vulnerable to adversarial attacks and adversarial perturbations, showing the difficulty of learning to master the games in all legal states. We show that Nim can be learned on small boards, but the learning progress of AlphaZero-style algorithms dramatically slows down when the board size increases. Intuitively, the difference between impartial games like Nim and partisan games like Chess and Go can be explained by the fact that if a small part of the board is covered for impartial games it is typically not possible to predict whether the position is won or lost as there is often zero correlation between the visible part of a partly blanked-out position and its correct evaluation. This situation starkly contrasts partisan games where a partly blanked-out board position typically provides abundant or at least non-trifle information about the value of the fully uncovered position.
- North America > United States (0.15)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (3 more...)
Modelling Uncertainty in the Game of Go
Two players, Black and White, take turns to place stones on the intersections of an N N grid (usually N 19 but smaller boards are in use as well). All the stones of each player are identical. Players place their stones in order to create territory by occupying or surrounding areas of the board. The player with the most territory at the end of the game is the winner. A stone is captured if it has been completely surrounded (in the horizontal and vertical directions) by stones of the opponent's colour.
Concise QBF Encodings for Games on a Grid (extended version)
Shaik, Irfansha, van de Pol, Jaco
Encoding 2-player games in QBF correctly and efficiently is challenging and error-prone. To enable concise specifications and uniform encodings of games played on grid boards, like Tic-Tac-Toe, Connect-4, Domineering, Pursuer-Evader and Breakthrough, we introduce BDDL - Board-game Domain Definition Language, inspired by the success of PDDL in the planning domain. We provide an efficient translation from BDDL into QBF, encoding the existence of a winning strategy of bounded depth. Our lifted encoding treats board positions symbolically and allows concise definitions of conditions, effects and winning configurations, relative to symbolic board positions. The size of the encoding grows linearly in the input model and the considered depth. To show the feasibility of such a generic approach, we use QBF solvers to compute the critical depths of winning strategies for instances of several known games. For several games, our work provides the first QBF encoding. Unlike plan validation in SAT-based planning, validating QBF-based winning strategies is difficult. We show how to validate winning strategies using QBF certificates and interactive game play.
Monte Carlo Tree Search (MCTS) in AlphaGo Zero
In a Go game, AlphaGo Zero uses MC Tree Search to build a local policy to sample the next move. MCTS searches for possible moves and records the results in a search tree. As more searches are performed, the tree grows larger as well as its information. To make a move in Alpha-Go Zero, 1,600 searches will be computed. Then a local policy is constructed.
The cost of passing -- using deep learning AIs to expand our understanding of the ancient game of Go
Egri-Nagy, Attila, Törmänen, Antti
AI engines utilizing deep learning neural networks provide excellent tools for analyzing traditional board games. Here we are interested in gaining new insights into the ancient game of Go. For that purpose, we need to define new numerical measures based on the raw output of the engines. In this paper, we develop a numerical tool for automated move-by-move performance evaluation in a context-sensitive manner and for recognizing game features. We measure the urgency of a move by the cost of passing, which is the score value difference between the current configuration of stones and after a hypothetical pass in the same board position. Here we investigate the properties of this measure and describe some applications.